10 research outputs found
Learning From Drift: Federated Learning on Non-IID Data via Drift Regularization
Federated learning algorithms perform reasonably well on independent and
identically distributed (IID) data. They, on the other hand, suffer greatly
from heterogeneous environments, i.e., Non-IID data. Despite the fact that many
research projects have been done to address this issue, recent findings
indicate that they are still sub-optimal when compared to training on IID data.
In this work, we carefully analyze the existing methods in heterogeneous
environments. Interestingly, we find that regularizing the classifier's outputs
is quite effective in preventing performance degradation on Non-IID data.
Motivated by this, we propose Learning from Drift (LfD), a novel method for
effectively training the model in heterogeneous settings. Our scheme
encapsulates two key components: drift estimation and drift regularization.
Specifically, LfD first estimates how different the local model is from the
global model (i.e., drift). The local model is then regularized such that it
does not fall in the direction of the estimated drift. In the experiment, we
evaluate each method through the lens of the five aspects of federated
learning, i.e., Generalization, Heterogeneity, Scalability, Forgetting, and
Efficiency. Comprehensive evaluation results clearly support the superiority of
LfD in federated learning with Non-IID data
Classification of Radiology Reports Using Neural Attention Models
The electronic health record (EHR) contains a large amount of
multi-dimensional and unstructured clinical data of significant operational and
research value. Distinguished from previous studies, our approach embraces a
double-annotated dataset and strays away from obscure "black-box" models to
comprehensive deep learning models. In this paper, we present a novel neural
attention mechanism that not only classifies clinically important findings.
Specifically, convolutional neural networks (CNN) with attention analysis are
used to classify radiology head computed tomography reports based on five
categories that radiologists would account for in assessing acute and
communicable findings in daily practice. The experiments show that our CNN
attention models outperform non-neural models, especially when trained on a
larger dataset. Our attention analysis demonstrates the intuition behind the
classifier's decision by generating a heatmap that highlights attended terms
used by the CNN model; this is valuable when potential downstream medical
decisions are to be performed by human experts or the classifier information is
to be used in cohort construction such as for epidemiological studies
Boosting Convolutional Neural Networks' Protein Binding Site Prediction Capacity Using SE(3)-invariant transformers, Transfer Learning and Homology-based Augmentation
Figuring out small molecule binding sites in target proteins, in the
resolution of either pocket or residue, is critical in many virtual and real
drug-discovery scenarios. Since it is not always easy to find such binding
sites based on domain knowledge or traditional methods, different deep learning
methods that predict binding sites out of protein structures have been
developed in recent years. Here we present a new such deep learning algorithm,
that significantly outperformed all state-of-the-art baselines in terms of the
both resolutions\unicode{x2013}pocket and residue. This good performance was
also demonstrated in a case study involving the protein human serum albumin and
its binding sites. Our algorithm included new ideas both in the model
architecture and in the training method. For the model architecture, it
incorporated SE(3)-invariant geometric self-attention layers that operate on
top of residue-level CNN outputs. This residue-level processing of the model
allowed a transfer learning between the two resolutions, which turned out to
significantly improve the binding pocket prediction. Moreover, we developed
novel augmentation method based on protein homology, which prevented our model
from over-fitting. Overall, we believe that our contribution to the literature
is twofold. First, we provided a new computational method for binding site
prediction that is relevant to real-world applications, as shown by the good
performance on different benchmarks and case study. Second, the novel ideas in
our method\unicode{x2013}the model architecture, transfer learning and the
homology augmentation\unicode{x2013}would serve as useful components in
future works.Comment: Updates in version 2: author order change (making it clear that
Bonggun Shin is the corresponding author
Phase-shifted Adversarial Training
Adversarial training has been considered an imperative component for safely
deploying neural network-based applications to the real world. To achieve
stronger robustness, existing methods primarily focus on how to generate strong
attacks by increasing the number of update steps, regularizing the models with
the smoothed loss function, and injecting the randomness into the attack.
Instead, we analyze the behavior of adversarial training through the lens of
response frequency. We empirically discover that adversarial training causes
neural networks to have low convergence to high-frequency information,
resulting in highly oscillated predictions near each data. To learn
high-frequency contents efficiently and effectively, we first prove that a
universal phenomenon of frequency principle, i.e., \textit{lower frequencies
are learned first}, still holds in adversarial training. Based on that, we
propose phase-shifted adversarial training (PhaseAT) in which the model learns
high-frequency components by shifting these frequencies to the low-frequency
range where the fast convergence occurs. For evaluations, we conduct the
experiments on CIFAR-10 and ImageNet with the adaptive attack carefully
designed for reliable evaluation. Comprehensive results show that PhaseAT
significantly improves the convergence for high-frequency information. This
results in improved adversarial robustness by enabling the model to have
smoothed predictions near each data.Comment: Conference on Uncertainty in Artificial Intelligence, 2023 (UAI 2023
Improving Evidential Deep Learning via Multi-Task Learning
The Evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, it is possible that the target is inaccurately predicted due to the gradient shrinkage problem of the original loss function of the ENet, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL, we define the Lipschitz modified mean squared error (MSE) loss function as another loss and add it to the existing NLL loss. The Lipschitz modified MSE loss is designed to mitigate the gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant. By doing so, the Lipschitz MSE loss does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on the synthetic dataset and real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks